---
title: "Using a delay-adjusted case fatality ratio to estimate under-reporting"
description: "Using a corrected case fatality ratio, we calculate estimates of the level of under-reporting for any country with greater than ten deaths"
status: real-time-report
rmarkdown_html_fragment: true
update: 2020-05-19
authors:
  - id: tim_russell
    corresponding: true
  - id: joel_hellewell
    equal: 1
  - id: sam_abbott
    equal: 1
  - id: nick_golding
  - id: hamish_gibbs
  - id: chris_jarvis
  - id: kevin_vanzandvoort
  - id: ncov-group
  - id: stefan_flasche
  - id: roz_eggo
  - id: john_edmunds
  - id: adam_kucharski
---

Aim

To estimate the percentage of symptomatic COVID-19 cases reported in different countries using case fatality ratio estimates based on data from the ECDC, correcting for the delay between confirmation and death.

Methods Summary

Current estimates for percentage of symptomatic cases reported for countries with greater than ten deaths

Temporal variation

Figure 1: Temporal variation in reporting rate. We calculate the percentage of symptomatic cases reported on each day a country has had more than ten deaths. We then fit a Gaussian Process (GP) to these data (see Temporal variation model fitting section for details), highlighting the temporal trend of each country's reporting rate. The red shaded region is the 95% CrI of the fitted GP.

Adjusted symptomatic case estimates

Figure 2: Estimated number of new symptomatic cases, calculated using our temporal under-reporting estimates. For each country with an under-reporting estimate, we adjust the reported case numbers each day using our temporal under-reporting estimates to arrive at an estimate of the true number of symptomatic cases each day. The shaded blue region represents the 95% CrI, calculated directly using the 95% CrI of the temporal under-reporting estimate.

Reported cases

Figure 3: Reported number of cases each day, pulled from the ECDC and plotted against time for comparison with our estimated true numbers of symptomatic cases each day, adjusted using our under-reporting estimates.

Table of current estimates

| Country | Percentage of symptomatic cases reported (95% CI) | Total cases | Total deaths |
|---|---|---|---|
| Afghanistan | 44% (31%-60%) | 7,072 | 173 |
| Albania | 81% (42%-100%) | 948 | 31 |
| Algeria | 33% (21%-48%) | 7,201 | 555 |
| Andorra | 22% (12%-37%) | 761 | 51 |
| Argentina | 20% (15%-24%) | 8,358 | 382 |
| Armenia | 78% (54%-99%) | 4,823 | 61 |
| Australia | 86% (53%-100%) | 7,060 | 99 |
| Austria | 45% (28%-70%) | 16,179 | 629 |
| Azerbaijan | 85% (59%-100%) | 3,287 | 40 |
| Bahamas | 50% (12%-98%) | 96 | 11 |
| Bangladesh | 82% (44%-100%) | 23,870 | 349 |
| Belarus | 99% (93%-100%) | 30,572 | 171 |
| Belgium | 12% (8.8%-15%) | 55,559 | 9,080 |
| Bolivia | 19% (14%-25%) | 4,263 | 174 |
| Bosnia and Herzegovina | 12% (6.8%-19%) | 2,303 | 132 |
| Brazil | 13% (10%-15%) | 254,220 | 16,792 |
| Bulgaria | 27% (18%-41%) | 2,259 | 112 |
| Burkina Faso | 25% (15%-46%) | 796 | 51 |
| Cameroon | 33% (16%-58%) | 3,292 | 140 |
| Canada | 14% (11%-16%) | 78,061 | 5,842 |
| Chad | 7.3% (4.1%-13%) | 519 | 53 |
| Chile | 93% (79%-100%) | 46,059 | 478 |
| China | 98% (70%-100%) | 84,063 | 4,638 |
| Colombia | 29% (23%-37%) | 16,295 | 592 |
| Congo | 30% (9.7%-74%) | 412 | 15 |
| Cote d'Ivoire | 83% (53%-100%) | 2,119 | 28 |
| Croatia | 19% (10%-31%) | 2,228 | 95 |
| Cuba | 35% (22%-57%) | 1,881 | 79 |
| Cyprus | 83% (48%-100%) | 917 | 17 |
| Czechia | 33% (24%-43%) | 8,586 | 297 |
| Democratic Republic of the Congo | 34% (19%-57%) | 1,629 | 61 |
| Denmark | 34% (24%-46%) | 10,968 | 548 |
| Dominican Republic | 50% (37%-65%) | 12,725 | 434 |
| Ecuador | 8.9% (6.7%-11%) | 33,582 | 2,799 |
| Egypt | 34% (26%-43%) | 12,764 | 645 |
| El Salvador | 49% (27%-86%) | 1,413 | 30 |
| Estonia | 33% (21%-51%) | 1,784 | 64 |
| Finland | 32% (19%-50%) | 6,380 | 300 |
| France | 7.9% (6.4%-9.4%) | 142,903 | 28,239 |
| Germany | 21% (17%-26%) | 175,210 | 8,007 |
| Ghana | 96% (83%-100%) | 5,735 | 29 |
| Greece | 29% (18%-44%) | 2,836 | 165 |
| Guatemala | 46% (28%-76%) | 2,001 | 38 |
| Guernsey | 43% (13%-94%) | 252 | 13 |
| Guinea | 90% (65%-100%) | 2,796 | 16 |
| Guyana | 53% (11%-99%) | 124 | 10 |
| Haiti | 8.6% (4.4%-16%) | 533 | 21 |
| Honduras | 20% (13%-33%) | 2,798 | 146 |
| Hungary | 12% (8.1%-16%) | 3,556 | 467 |
| Iceland | 87% (55%-100%) | 1,802 | 10 |
| India | 37% (30%-45%) | 101,139 | 3,163 |
| Indonesia | 27% (17%-38%) | 18,010 | 1,191 |
| Iran | 36% (29%-43%) | 122,492 | 7,057 |
| Iraq | 57% (29%-93%) | 3,554 | 127 |
| Ireland | 42% (29%-57%) | 24,200 | 1,547 |
| Isle of Man | 23% (8.2%-72%) | 335 | 24 |
| Israel | 81% (62%-99%) | 16,621 | 272 |
| Italy | 14% (11%-16%) | 225,886 | 32,007 |
| Japan | 14% (10%-19%) | 16,365 | 763 |
| Jersey | 13% (6.2%-29%) | 303 | 27 |
| Kazakhstan | 97% (86%-100%) | 6,751 | 35 |
| Kenya | 16% (9.6%-25%) | 912 | 50 |
| Kosovo | 49% (28%-83%) | 955 | 29 |
| Kuwait | 85% (64%-100%) | 15,691 | 118 |
| Kyrgyzstan | 69% (34%-100%) | 1,243 | 14 |
| Latvia | 50% (24%-93%) | 1,009 | 19 |
| Lebanon | 58% (30%-98%) | 931 | 26 |
| Liberia | 47% (11%-97%) | 229 | 22 |
| Lithuania | 34% (19%-53%) | 1,547 | 59 |
| Luxembourg | 51% (33%-71%) | 3,947 | 107 |
| Malaysia | 95% (71%-100%) | 6,941 | 113 |
| Mali | 20% (13%-31%) | 874 | 52 |
| Mauritius | 62% (16%-100%) | 332 | 10 |
| Mexico | 7.2% (5.8%-8.5%) | 51,633 | 5,332 |
| Moldova | 29% (21%-38%) | 6,138 | 215 |
| Morocco | 96% (82%-100%) | 6,952 | 192 |
| Netherlands | 16% (13%-20%) | 44,141 | 5,694 |
| New Zealand | 47% (23%-86%) | 1,153 | 21 |
| Niger | 16% (7.2%-29%) | 909 | 55 |
| Nigeria | 32% (22%-44%) | 6,175 | 191 |
| North Macedonia | 26% (17%-41%) | 1,817 | 104 |
| Norway | 38% (17%-66%) | 8,249 | 233 |
| Oman | 95% (81%-100%) | 5,379 | 26 |
| Pakistan | 48% (37%-59%) | 43,966 | 939 |
| Panama | 51% (37%-68%) | 9,726 | 279 |
| Paraguay | 84% (51%-100%) | 788 | 11 |
| Peru | 37% (30%-45%) | 94,933 | 2,789 |
| Philippines | 19% (14%-25%) | 12,718 | 831 |
| Poland | 25% (19%-31%) | 18,885 | 936 |
| Portugal | 41% (32%-51%) | 29,209 | 1,231 |
| Puerto Rico | 36% (23%-55%) | 2,710 | 124 |
| Qatar | 91% (59%-100%) | 33,969 | 15 |
| Romania | 18% (14%-23%) | 17,036 | 1,107 |
| Russia | 91% (80%-99%) | 290,678 | 2,722 |
| San Marino | 78% (41%-100%) | 654 | 41 |
| Saudi Arabia | 99% (93%-100%) | 57,345 | 320 |
| Senegal | 72% (42%-99%) | 2,544 | 26 |
| Serbia | 88% (62%-100%) | 10,699 | 231 |
| Sierra Leone | 11% (6.3%-19%) | 519 | 33 |
| Singapore | 94% (70%-100%) | 28,343 | 22 |
| Sint Maarten | 12% (4%-33%) | 77 | 15 |
| Slovakia | 66% (38%-98%) | 1,495 | 28 |
| Slovenia | 17% (11%-27%) | 1,466 | 104 |
| Somalia | 35% (19%-61%) | 1,455 | 57 |
| South Africa | 42% (33%-54%) | 16,433 | 286 |
| South Korea | 47% (20%-83%) | 11,078 | 263 |
| Spain | 14% (11%-17%) | 231,606 | 27,709 |
| Sudan | 20% (13%-30%) | 2,591 | 105 |
| Sweden | 10% (8.1%-12%) | 30,377 | 3,698 |
| Switzerland | 23% (18%-29%) | 30,514 | 1,602 |
| Thailand | 75% (50%-99%) | 3,031 | 56 |
| Tunisia | 57% (27%-98%) | 1,043 | 46 |
| Turkey | 65% (53%-79%) | 150,593 | 4,171 |
| Ukraine | 36% (23%-49%) | 18,616 | 535 |
| United Arab Emirates | 92% (75%-100%) | 24,190 | 224 |
| United Kingdom | 16% (13%-19%) | 246,406 | 34,796 |
| United Republic of Tanzania | 46% (24%-83%) | 509 | 21 |
| United States of America | 20% (16%-24%) | 1,508,598 | 90,353 |
| Uruguay | 45% (21%-86%) | 737 | 20 |
| Uzbekistan | 91% (67%-100%) | 2,802 | 13 |
| Venezuela | 82% (45%-100%) | 618 | 10 |

Table 1: Estimates for the proportion of symptomatic cases reported in different countries using cCFR estimates based on case and death time-series data from the ECDC. Total cases and deaths in each country are also shown. Confidence intervals were calculated using an exact binomial test at the 95% confidence level.
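For readers unfamiliar with exact binomial intervals, the sketch below shows the standard Clopper-Pearson construction, which is what an exact binomial test usually returns. It is illustrative only: the function names are hypothetical, and which counts are passed as successes and trials in the analysis (for example deaths and cases with known outcomes) is an assumption not stated here.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(1.0, total)


def _root_of_increasing(func, iters=100):
    """Bisection on [0, 1] for a function that increases from negative to positive."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if func(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exact_binomial_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for a proportion, given k successes in n trials."""
    if k == 0:
        lower = 0.0
    else:
        # Lower limit: the p at which P(X >= k) equals alpha / 2.
        lower = _root_of_increasing(lambda p: (1.0 - binom_cdf(k - 1, n, p)) - alpha / 2.0)
    if k == n:
        upper = 1.0
    else:
        # Upper limit: the p at which P(X <= k) equals alpha / 2.
        upper = _root_of_increasing(lambda p: alpha / 2.0 - binom_cdf(k, n, p))
    return lower, upper
```

An interval for the cCFR obtained this way can be mapped to an interval for the proportion of cases reported by applying the same ratio to each limit, noting that the upper cCFR limit gives the lower reporting limit.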

Adjusting for outcome delay in CFR estimates

During an outbreak, the naive CFR (nCFR), i.e. the ratio of reported deaths to date to reported cases to date, will underestimate the true CFR because the outcome (recovery or death) is not known for all cases [5]. We can therefore estimate the true denominator for the CFR (i.e. the number of cases with known outcomes) by accounting for the delay from confirmation-to-death [1].

We assumed the delay from confirmation-to-death followed the same distribution as the estimated hospitalisation-to-death delay, based on data from the COVID-19 outbreak in Wuhan, China, between the 17th December 2019 and the 22nd January 2020, accounting for right-censoring in the data as a result of as-yet-unknown disease outcomes (Figure 1, panels A and B in [7]). The distribution used is a lognormal fit with a mean delay of 13 days and a standard deviation of 12.7 days [7].
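To make the delay distribution concrete, the following sketch converts the stated mean (13 days) and standard deviation (12.7 days) into the log-scale parameters of a lognormal distribution and discretises it into daily probabilities. The daily binning rule and the function names are assumptions made for illustration, not a description of the analysis code.

```python
import math


def lognormal_params(mean, sd):
    """Convert the mean and sd of a lognormal variable to log-scale (mu, sigma)."""
    sigma_sq = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma_sq / 2.0
    return mu, math.sqrt(sigma_sq)


def lognormal_cdf(x, mu, sigma):
    """Cumulative distribution function of a lognormal variable."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def discretised_delay(max_days, mean=13.0, sd=12.7):
    """Daily delay probabilities for 0..max_days days.

    Assumption: day j receives the probability mass on the interval [j, j + 1).
    """
    mu, sigma = lognormal_params(mean, sd)
    return [lognormal_cdf(j + 1, mu, sigma) - lognormal_cdf(j, mu, sigma)
            for j in range(max_days + 1)]


mu, sigma = lognormal_params(13.0, 12.7)
delay_pmf = discretised_delay(60)
```

Truncating at 60 days leaves a small amount of probability mass in the tail, so the daily probabilities sum to slightly less than one.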

To correct the CFR, we use the case and death incidence data to estimate the proportion of cases with known outcomes [1,6]:

\[ u_{t} = \frac{ \sum_{j = 0}^{t} c_{t-j} f_j}{c_t}, \]

where \(u_t\) represents the underestimation of the proportion of cases with known outcomes [1,5,6] and is used to scale the value of the cumulative number of cases in the denominator in the calculation of the cCFR, \(c_{t}\) is the daily case incidence at time \(t\), and \(f_t\) is the proportion of cases with a delay of \(t\) between confirmation and death.
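A minimal sketch of this correction follows, assuming daily case and death counts held in plain lists. The numerator of the formula above is computed per day, and the corrected CFR is then taken as total deaths divided by the total expected number of cases with known outcomes. That aggregation step, the function names and the toy numbers are assumptions for illustration rather than the analysis code.

```python
def known_outcomes(cases, delay_pmf):
    """Expected number of cases with known outcomes on each day t.

    Implements the sum over j of c_{t-j} * f_j, with cases[t] as c_t
    and delay_pmf[j] as f_j.
    """
    known = []
    for t in range(len(cases)):
        max_j = min(t, len(delay_pmf) - 1)
        known.append(sum(cases[t - j] * delay_pmf[j] for j in range(max_j + 1)))
    return known


def known_outcome_proportion(cases, delay_pmf):
    """u_t for each day; None where c_t is zero and the ratio is undefined."""
    known = known_outcomes(cases, delay_pmf)
    return [k / c if c > 0 else None for k, c in zip(known, cases)]


def naive_and_corrected_cfr(cases, deaths, delay_pmf):
    """Return (nCFR, cCFR), using cumulative known outcomes as the cCFR denominator."""
    total_known = sum(known_outcomes(cases, delay_pmf))
    n_cfr = sum(deaths) / sum(cases)
    c_cfr = sum(deaths) / total_known
    return n_cfr, c_cfr


# Toy data: rising incidence, with all deaths occurring 1 or 2 days after confirmation.
toy_cases = [10, 20, 40]
toy_deaths = [0, 0, 1]
toy_delay = [0.0, 0.5, 0.5]
u = known_outcome_proportion(toy_cases, toy_delay)
n_cfr, c_cfr = naive_and_corrected_cfr(toy_cases, toy_deaths, toy_delay)
```

In this toy example only 20 of the 70 reported cases are expected to have a known outcome, so the corrected CFR is larger than the naive CFR.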

Approximating the proportion of symptomatic cases reported

At this stage, raw estimates of the CFR of COVID-19 correcting for delay to outcome, but not under-reporting, have been calculated. These estimates range between 1% and 1.5% [1–3]. We assume a CFR of 1.4% (95% CrI: 1.2-1.7%), taken from a recent large study [3], as a baseline CFR. We use it to approximate the potential level of under-reporting in each country. Specifically, we calculate \(\frac{1.4\%}{\text{cCFR}}\) for each country to estimate an approximate fraction of cases reported.
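As a worked illustration of this ratio, the sketch below divides the baseline CFR by a country's cCFR. Capping the result at 100% is an assumption suggested by the upper limits in Table 1, and the function name and example values are hypothetical.

```python
BASELINE_CFR = 0.014  # assumed baseline CFR of 1.4%


def proportion_reported(c_cfr, baseline_cfr=BASELINE_CFR):
    """Approximate fraction of symptomatic cases reported, capped at 1."""
    if c_cfr <= 0:
        raise ValueError("cCFR must be positive")
    return min(1.0, baseline_cfr / c_cfr)


# A country with a cCFR of 7% would be estimated to report about a fifth of its cases.
example = proportion_reported(0.07)
```

A cCFR at or below the baseline gives an estimate of 100%, meaning no under-reporting can be detected by this approach.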

Temporal variation model fitting

We estimate the level of under-reporting on every day for each country that has had more than ten deaths. We then fit a Gaussian Process (GP) model using the R libraries greta and greta.gp. The parameters we fit and their priors are the following:

\[ \begin{aligned} &\sigma \sim \text{Log Normal}(-1, 1): \quad &\text{Variance of the reporting kernel} \\ &\text{L} \sim \text{Log Normal}(4, 0.5): \quad &\text{Lengthscale of the reporting kernel} \\ &\sigma_{\text{obs}} \sim \text{Truncated Normal}(0, 0.5): \quad &\text{Variance of the observation kernel, truncated at 0} \end{aligned} \]

The kernel is split into two components: the reporting kernel \(R\), and the observation kernel \(O\). The reporting component has a standard squared-exponential form. For the observation component, we use an i.i.d. noise kernel to account for observation overdispersion, which can smooth out overly clumped death time-series. This is important as some countries have been known to report an unusually large number of deaths on a single day, due to past under-reporting.
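The two kernel components can be sketched as follows. This uses the common textbook parameterisation of the squared-exponential kernel and adds the i.i.d. noise variance on the diagonal only. It is written in plain Python for illustration, and the exact parameterisation used by greta.gp may differ.

```python
import math


def reporting_kernel(t1, t2, variance, lengthscale):
    """Squared-exponential covariance between two time points."""
    return variance * math.exp(-0.5 * ((t1 - t2) / lengthscale) ** 2)


def combined_covariance(times, variance, lengthscale, obs_variance):
    """Covariance matrix R + O: squared-exponential plus i.i.d. noise on the diagonal."""
    n = len(times)
    return [[reporting_kernel(times[i], times[j], variance, lengthscale)
             + (obs_variance if i == j else 0.0)
             for j in range(n)]
            for i in range(n)]


# Hypothetical parameter values, chosen only to show the shape of the matrix.
K = combined_covariance([0, 1, 2], variance=0.5, lengthscale=50.0, obs_variance=0.1)
```

A long lengthscale makes nearby days strongly correlated, so the fitted trend changes slowly, while the diagonal noise term lets individual days deviate from that trend.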

In the sampling and fitting process, we calculate the expected number of deaths at each time-point, given the baseline CFR. We then use a Poisson likelihood for the observed number of deaths, with the expected number of deaths as its rate.
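The likelihood step might look like the sketch below. The form of the expected-deaths calculation is an assumption, obtained by rearranging the reporting ratio (baseline CFR divided by cCFR) with the cCFR taken as deaths over cases with known outcomes. The function names are hypothetical.

```python
import math


def expected_deaths(known_outcome_cases, reporting, baseline_cfr=0.014):
    """Expected deaths given the baseline CFR and a reporting proportion in (0, 1]."""
    return baseline_cfr * known_outcome_cases / reporting


def poisson_log_likelihood(observed_deaths, expected):
    """Sum of Poisson log-probabilities of the observed counts given positive rates."""
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1)
               for k, lam in zip(observed_deaths, expected))


# With 1,000 known-outcome cases and 50% reporting, about 28 deaths are expected.
rate = expected_deaths(1000, 0.5)
log_lik = poisson_log_likelihood([28], [rate])
```

Lower reporting proportions imply more true cases behind each reported case and therefore more expected deaths, which is how the observed deaths inform the reporting estimate.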

Adjusting case counts for under-reporting

We adjust the reported number of cases each day, pulled from the ECDC. Specifically, we divide the case numbers of each day by our “proportion of cases reported” estimates that we calculate each day for each country.
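A minimal sketch of this adjustment, including how the interval limits swap over under division, is given below. The function name and the example values are hypothetical.

```python
def adjust_cases(reported, prop_mid, prop_low, prop_high):
    """Divide daily reported cases by the proportion of cases reported.

    prop_low and prop_high are the interval limits of the reporting proportion.
    Dividing by the lower limit gives the upper limit on adjusted cases.
    """
    estimate = [c / p for c, p in zip(reported, prop_mid)]
    upper = [c / p for c, p in zip(reported, prop_low)]
    lower = [c / p for c, p in zip(reported, prop_high)]
    return lower, estimate, upper


# 100 reported cases with 20% (10%-50%) of cases reported.
lower, estimate, upper = adjust_cases([100], [0.2], [0.1], [0.5])
```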

Limitations

Implicit in assuming that the under-reporting is \(\frac{1.4\%}{\text{cCFR}}\) for a given country is that the deviation away from the assumed 1.4% CFR is entirely down to under-reporting. In reality, the burden on the healthcare system is a likely contributing factor to CFR estimates higher than 1.4%, along with many other country-specific factors.

The following is a list of the other prominent assumptions made in our analysis:

Code and data availability

The code is publicly available at https://github.com/thimotei/CFR_calculation. The data required for this analysis are time series of both cases and deaths, along with the corresponding delay distribution. We scrape these data from the ECDC using the NCoVUtils package [8].

Acknowledgements

The authors, on behalf of the Centre for the Mathematical Modelling of Infectious Diseases (CMMID) COVID-19 working group, wish to thank DSTL for providing the High Performance Computing facilities and associated expertise that has enabled these models to be prepared, run and processed in an appropriately rapid and highly efficient manner.

References

1 Russell TW, Hellewell J, Jarvis CI et al. Estimating the infection and case fatality ratio for COVID-19 using age-adjusted data from the outbreak on the Diamond Princess cruise ship. medRxiv 2020.

2 Verity R, Okell LC, Dorigatti I et al. Estimates of the severity of COVID-19 disease. medRxiv 2020.

3 Guan W-j, Ni Z-y, Hu Y et al. Clinical characteristics of coronavirus disease 2019 in China. New England Journal of Medicine 2020.

4 Shim E, Mizumoto K, Choi W et al. Estimating the risk of COVID-19 death during the course of the outbreak in Korea, February-March, 2020. medRxiv 2020.

5 Kucharski AJ, Edmunds WJ. Case fatality rate for Ebola virus disease in West Africa. The Lancet 2014;384:1260.

6 Nishiura H, Klinkenberg D, Roberts M et al. Early epidemiological assessment of the virulence of emerging infectious diseases: A case study of an influenza pandemic. PLoS One 2009;4.

7 Linton NM, Kobayashi T, Yang Y et al. Incubation period and other epidemiological characteristics of 2019 novel coronavirus infections with right truncation: A statistical analysis of publicly available case data. Journal of Clinical Medicine 2020;9:538.

8 Abbott S MJ Hellewell J. NCoVUtils: Utility functions for the 2019-nCoV outbreak. doi:10.5281/zenodo.3635417 2020.